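I'd planned to take on llama4 400B around the May Day holiday, but it turned out not to be much of a challenge. With a q4 model on a 7970X (150GB/s), pure CPU gets prefill 108 t/s and decode 13.8 t/s; offloading the dense layers using 8G of VRAM gives 27 t/s; filling both cards' 96G of VRAM gets 30.8 t/s.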
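As a rough sanity check on the pure-CPU decode number, here is a minimal bandwidth-bound estimate. It assumes "llama4 400B" is Llama 4 Maverick with about 17B active parameters per token, and that the q4 quantization averages about 4.5 bits per weight; neither figure is stated in the post, and `decode_ceiling_tps` is a made-up helper name.

```python
def decode_ceiling_tps(bandwidth_gb_s, active_params_b, bits_per_weight):
    """Upper bound on decode tokens/s if every token reads all active weights once."""
    # Parameters are given in billions, so the result is in GB per token.
    gb_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_per_token


BANDWIDTH_GB_S = 150.0   # memory bandwidth quoted in the post
ACTIVE_PARAMS_B = 17.0   # assumption: Llama 4 Maverick active parameters per token
BITS_PER_WEIGHT = 4.5    # assumption: typical average for a q4 quantization
MEASURED_TPS = 13.8      # pure-CPU decode speed quoted in the post

ceiling = decode_ceiling_tps(BANDWIDTH_GB_S, ACTIVE_PARAMS_B, BITS_PER_WEIGHT)
print(f"ceiling ~ {ceiling:.1f} t/s, measured {MEASURED_TPS} t/s "
      f"({MEASURED_TPS / ceiling:.0%} of ceiling)")
```

Under those assumptions the ceiling comes out a little above the measured 13.8 t/s, which would mean pure-CPU decode is already close to memory-bandwidth-bound.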

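However, llama.cpp's override tensors prefill appears to run purely on the GPU, reading the model held in system memory over PCIe, so there is still room for optimization. At the very least it should not be slower than pure CPU.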
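To put a number on "should not be slower than pure CPU", here is a back-of-the-envelope sketch of prefill speed if the GPU has to stream the weights over PCIe once per batch. Everything beyond the 108 t/s CPU figure is an assumption: a roughly 225 GB q4 model (400B parameters at about 4.5 bits each), a 32 GB/s link (about PCIe 4.0 x16), no overlap of transfer and compute, and the whole model being streamed (an upper bound, since layers resident in VRAM need no transfer). The function names are hypothetical.

```python
def streamed_prefill_tps(batch_tokens, model_gb, link_gb_s):
    """Prefill tokens/s if each batch streams model_gb of weights over the link once."""
    seconds_per_batch = model_gb / link_gb_s
    return batch_tokens / seconds_per_batch


def break_even_batch(target_tps, model_gb, link_gb_s):
    """Batch size at which streamed prefill matches target_tps."""
    return target_tps * model_gb / link_gb_s


MODEL_GB = 400 * 4.5 / 8   # assumption: 400B params at ~4.5 bits/weight = 225 GB
LINK_GB_S = 32.0           # assumption: roughly PCIe 4.0 x16
CPU_PREFILL_TPS = 108.0    # pure-CPU prefill speed quoted in the post

for batch in (512, 1024, 2048):
    tps = streamed_prefill_tps(batch, MODEL_GB, LINK_GB_S)
    print(f"batch {batch}: ~{tps:.0f} t/s if transfer-bound")

needed = break_even_batch(CPU_PREFILL_TPS, MODEL_GB, LINK_GB_S)
print(f"break-even batch vs CPU: ~{needed:.0f} tokens")
```

With these numbers, a transfer-bound GPU prefill only catches up with the CPU once a batch holds several hundred tokens, so small batches would plausibly land below the pure-CPU result.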



tg-me.com/david_random/574
BY David's random thoughts





